Please replace the Title as follows:
[illegible] Annotations



Please replace the last paragraph on p. 2 through the third paragraph on p. 4 with the following:



Websites like GeneCards (Rebhan M et al, Bioinformatics 14(8):656-64, 1998) [illegible] provide a database of information about the functions of all human genes that have an approved symbol, and select others. Again, this information can only be accessed one gene at a time, and the [illegible]

DRAGON (Bouton CM et al, Bioinformatics 16(11):1038-9, [illegible] numbers and [illegible] format, which is not linked to any of the [illegible]. The tool does not let the researcher select entries from the keyword search. It does not [illegible] obtained from different keyword searches. [illegible] DRAGON [illegible] include important databases like GenBank and LocusLink that are the most commonly used databases for searching candidate genes. None of these tools helps in eliminating sequence redundancies within the list. Databases like LocusLink and GeneCards attempt to integrate the unique characteristics from various databases and provide a broad summary on a single gene [illegible] information from the table. With more and more gene [illegible]

a. [illegible] of the probe DNA; b. Preparation of the probe DNA; c. Preparation of a [illegible] elements and design of the probe [illegible] and hybridized to fluorescent [illegible]

[illegible] in conversion of the fluorescence of the scanned image to numbers, using complex mathematical corrections to extract signal from background noise, e.g. GenePix ([illegible]), [illegible] Software [illegible] and ArrayVision [illegible]. These numbers indicate level of [illegible]

[illegible] HMS Beagle: The BioMedNet [illegible] Cluster TreeView (Eisen MB et al, Proc Natl Acad Sci USA, [illegible])
sorting on various [illegible] fields and the expression level data. This approach could be useful to view any [illegible] compare [illegible] between different [illegible]
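The background correction mentioned above can be illustrated with a minimal sketch. It assumes a two-color array read out as red and green channels (for example Cy5 and Cy3), with one foreground and one local background intensity per spot; the function names and the `floor` cutoff are hypothetical, and packages such as GenePix apply more elaborate corrections than this simple subtraction.

```python
import math
from typing import Optional

def background_corrected(foreground: float, background: float) -> float:
    """Subtract the local background from the spot's foreground intensity,
    flooring at zero so that noise cannot produce a negative signal."""
    return max(foreground - background, 0.0)

def log2_ratio(red_fg: float, red_bg: float, green_fg: float, green_bg: float,
               floor: float = 1.0) -> Optional[float]:
    """log2(red/green) of background-corrected intensities for one spot.
    Returns None when either corrected channel is below `floor`, i.e. the
    spot cannot be distinguished from background."""
    red = background_corrected(red_fg, red_bg)
    green = background_corrected(green_fg, green_bg)
    if red < floor or green < floor:
        return None
    return math.log2(red / green)
```

A spot with corrected intensities of 1000 (red) and 500 (green) yields a log ratio of 1.0, i.e. twofold higher signal in the red channel.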

[illegible] that fragments [illegible] hybridization by blocking simple repetitive elements in genomic DNA, as shown in experiments [illegible] cross-hybridization [illegible]. ARROGANT computationally estimates the amount of cross-hybridization for each sequence and tags potential genes as possible candidates for cross-hybridization.

Several computational tools and databases are available which may be used in the development of the code for working with large gene collections. Some of them are discussed below.

1. PRIMO ([illegible]) is a code that was developed to design primers for large-scale DNA sequencing projects. PRIMO designs primers (sequences typically 20 bases long), which are used to [illegible] sequences typically [illegible] in PCR. PRIMO can be made to design primers to amplify a specific region. PRIMO can be run in batch mode, and the region for the design of primers for each sequence can be specified separately. The parameters file (including parameters like oligo length, melting temperatures etc., [illegible]) [illegible]. The code has been successfully used to design primers for the past couple of years and is available on the web at [illegible]. [illegible] an important tool to design primers to amplify a large number of sequences simultaneously.
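PRIMO's own selection rules are not described here. As a rough illustration of filtering candidate primers on parameters like oligo length and melting temperature, the sketch below slides a fixed-length window along a template and keeps oligos whose Wallace-rule Tm (2 °C per A/T, 4 °C per G/C, a crude estimate for short oligos) and GC fraction fall inside chosen ranges. The rule, the default ranges and the function names are assumptions for illustration, not PRIMO's algorithm.

```python
from typing import List, Tuple

def gc_content(oligo: str) -> float:
    """Fraction of G and C bases in the oligo (0.0 to 1.0)."""
    oligo = oligo.upper()
    return (oligo.count("G") + oligo.count("C")) / len(oligo)

def wallace_tm(oligo: str) -> int:
    """Approximate melting temperature in degrees C (Wallace rule):
    Tm = 2*(A+T) + 4*(G+C)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc

def candidate_primers(template: str, length: int = 20,
                      tm_range: Tuple[int, int] = (56, 64),
                      gc_range: Tuple[float, float] = (0.4, 0.6)
                      ) -> List[Tuple[int, str, int]]:
    """Return (start, oligo, Tm) for every window of `length` bases whose
    Tm and GC fraction fall inside the (hypothetical) ranges."""
    hits = []
    for start in range(len(template) - length + 1):
        oligo = template[start:start + length].upper()
        tm = wallace_tm(oligo)
        gc = gc_content(oligo)
        if tm_range[0] <= tm <= tm_range[1] and gc_range[0] <= gc <= gc_range[1]:
            hits.append((start, oligo, tm))
    return hits
```

In a batch setting, the same function would simply be called once per sequence with that sequence's own region and parameter values.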

2. BLAST: BLAST (Basic Local Alignment Search Tool) is an alignment tool to search [illegible] (Journal of Molecular Biology 215(3):403-10, 1990). It is available at [illegible]. [illegible] ARROGANT uses the BLAST output to estimate [illegible] the array is BLASTed against the entire UniGene [illegible] and the BLAST output is parsed to detect [illegible] overlaps, used as a threshold for cross-hybridization.
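The parsing step described above can be sketched as follows, assuming BLAST tabular output (the 12-column `-outfmt 6` format of BLAST+, or `-m 8` in the older blastall) in which the subject IDs have already been mapped to UniGene cluster IDs. The identity and overlap thresholds are hypothetical placeholders, since the threshold ARROGANT actually applies is not legible in this text.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set

# Column order of BLAST tabular output (-outfmt 6 in BLAST+, -m 8 in blastall).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def cross_hyb_candidates(blast_lines: Iterable[str],
                         own_cluster: Dict[str, str],
                         min_identity: float = 80.0,
                         min_overlap: int = 50) -> Dict[str, Set[str]]:
    """For each array sequence, collect the *other* UniGene clusters it aligns
    to with at least `min_identity` percent identity over at least
    `min_overlap` aligned bases. Both default thresholds are hypothetical."""
    flagged: Dict[str, Set[str]] = defaultdict(set)
    # Skip blank lines and '#' comment lines (emitted by -outfmt 7 / -m 9).
    rows = csv.reader((ln for ln in blast_lines
                       if ln.strip() and not ln.startswith("#")),
                      delimiter="\t")
    for row in rows:
        hit = dict(zip(FIELDS, row))
        query, subject = hit["qseqid"], hit["sseqid"]
        if subject == own_cluster.get(query):
            continue  # a hit to the sequence's own cluster is expected
        if float(hit["pident"]) >= min_identity and int(hit["length"]) >= min_overlap:
            flagged[query].add(subject)
    return dict(flagged)
```

Sequences that appear in the returned dictionary would be the ones tagged as possible candidates for cross-hybridization.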



Section 1: Introduction to ARROGANT

ARROGANT is a database-driven tool developed to compile, [illegible] large gene collections. NCBI, KEGG, Research Genetics and other custom databases have been [illegible] related [illegible]. The local [illegible] of various databases and tools (e.g. PRIMO [illegible] BLAST) makes ARROGANT independent of other [illegible] extensively cover various [illegible] facilitates addition [illegible] mode [illegible] gene collections mode [illegible] resequence. The analysis mode annotates large gene [illegible]. ARROGANT takes over where [illegible] cross-hybridization for microarrays. When used for microarrays, ARROGANT [illegible] or clustering of sequences [illegible] to provide important [illegible] researchers to get a global view. ARROGANT has [illegible] a large number of gene collections [illegible] stored in the database. This [illegible] ARROGANT provides a web-based interface and hyperlinks various [illegible] three modes.



Please replace the [illegible]



[illegible] clone database ([illegible]) [illegible] above databases may be used.



Please replace section 3.5.6 bridging pp. 23 and 24 with the following:



ARROGANT uses a code called PRIMO available at [illegible] to design primers to amplify a specific region of [illegible]. ARROGANT [illegible] user modify [illegible] reverse primers to select (default [illegible]). Fig 8 shows a flowchart for primer design.
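A minimal sketch of choosing a user-set number of forward and reverse primers around a target region, assuming the user-modifiable setting controls how many of each are returned. Forward primers are taken from the sequence upstream of the target and reverse primers from the reverse complement of the downstream sequence; candidates are ranked by how close their Wallace-rule Tm is to a target value. The function names, defaults and ranking rule are hypothetical and are not taken from PRIMO or ARROGANT.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def tm(oligo: str) -> int:
    """Wallace-rule Tm estimate in degrees C: 2 per A/T, 4 per G/C."""
    o = oligo.upper()
    return 2 * (o.count("A") + o.count("T")) + 4 * (o.count("G") + o.count("C"))

def pick_primers(template: str, target_start: int, target_end: int,
                 n_forward: int = 3, n_reverse: int = 3,
                 length: int = 20, target_tm: int = 60
                 ) -> Tuple[List[str], List[str]]:
    """Return up to n_forward primers lying wholly upstream of
    template[target_start:target_end] and up to n_reverse primers
    (reverse-complemented) lying wholly downstream, each list ranked by
    |Tm - target_tm|. Ties keep their order along the template."""
    upstream = template[:target_start]
    downstream = template[target_end:]
    forward = [upstream[i:i + length]
               for i in range(len(upstream) - length + 1)]
    reverse = [revcomp(downstream[i:i + length])
               for i in range(len(downstream) - length + 1)]
    forward.sort(key=lambda p: abs(tm(p) - target_tm))
    reverse.sort(key=lambda p: abs(tm(p) - target_tm))
    return forward[:n_forward], reverse[:n_reverse]
```

Returning the reverse primers already reverse-complemented means each one can be ordered as written, 5' to 3'.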



GenBank, provided by NIH, is the biggest and the most used publicly available database (Nucleic Acids Research 2000 Jan 1;28(1):15-8). There are approximately 10,897,000 sequence records as of February 2001 (<[illegible]nlm.nih.gov>). The complete release notes for the current version of GenBank are available at <ftp[illegible]>. [illegible] is the single most important

. 09/865,090 



PAGE 6/16 * RCVD AT 12/9/2003 2:47:



c«o««42 SCIENCESTECHNOLOGY PAGE 87 

12/09/2003 11:45 6503434342 



database to search for possible gene candidates. Each GenBank entry has a [illegible]. ARROGANT uses the GenBank database in the design and analysis modes. GenBank is [illegible] performance, as the database is very large, containing [illegible]. The ARROGANT GenBank database is implemented as a single table, see Fig 13. [illegible] 'getgb' compares files present locally with their original source on the web and downloads only the [illegible]. The files are unzipped, combined into [illegible] their actual sizes and then reformatted, and can then be directly imported into the database using the 'bulk insert' script.
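The incremental download performed by 'getgb' can be sketched as a comparison between a remote listing and the local copy. This version assumes the remote file sizes are already known and treats a file as stale when it is missing locally or its size differs; the function name is hypothetical, and the original script may compare other attributes such as modification dates.

```python
import os
from typing import Dict, List

def files_to_fetch(remote_sizes: Dict[str, int], local_dir: str) -> List[str]:
    """Return the names of remote files that are missing from local_dir or
    whose local size differs from the size reported by the remote server."""
    stale = []
    for name, size in sorted(remote_sizes.items()):
        path = os.path.join(local_dir, name)
        if not os.path.exists(path) or os.path.getsize(path) != size:
            stale.append(name)
    return stale
```

The remote sizes could, for example, be collected with Python's `ftplib.FTP.nlst()` and `ftplib.FTP.size()`; only the returned names then need to be downloaded, unzipped and passed on to the import step.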

4.1.2 UniGene: UniGene partitions GenBank EST sequences into a non-redundant set of gene-oriented clusters. Each UniGene cluster contains sequences that presumably represent a [illegible] expressed and map location. The UniGene database was chosen to be a part of ARROGANT (see Fig 14) for the following reasons:
1. Avoid Redundancy: ARROGANT uses the UniGene database to [illegible] representing the same UniGene cluster. ARROGANT uses this in the merging gene collections mode to combine different lists into one unique collection.
2. Additional Annotation: The UniGene database includes [illegible]; it provides additional annotation for a given gene sequence, e.g. cDNA [illegible], which is used to look for keywords (design mode) and to annotate gene collections (analysis mode).
As a result, the UniGene database is used in all three modes by ARROGANT. Perl scripts combine similar files ([illegible]) [illegible] the import function in SQL Server 7.0.
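The redundancy check can be pictured as keying every accession by its UniGene cluster before merging lists. A minimal sketch, assuming an accession-to-cluster mapping has already been loaded from the UniGene files; the function name is hypothetical, and the accessions and cluster IDs in the check are made up. It returns cluster IDs, which a fuller version would join back to annotation.

```python
from typing import Dict, Iterable, List

def merge_collections(collections: Iterable[Iterable[str]],
                      accession_to_cluster: Dict[str, str]) -> List[str]:
    """Merge several gene lists into one non-redundant list. Accessions that
    map to the same UniGene cluster count as one gene; an accession with no
    cluster assignment is kept under its own ID. First-seen order is kept."""
    seen = set()
    merged: List[str] = []
    for collection in collections:
        for accession in collection:
            key = accession_to_cluster.get(accession, accession)
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged
```

For example, two lists that contain different ESTs from the same cluster contribute that cluster only once to the merged collection.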
4.1.3 LocusLink: LocusLink is NCBI's attempt to provide a single [illegible] by incorporating [illegible] accessions for a locus, as well as a new type, the NCBI Reference Sequence (RefSeq). RefSeq provides a reference sequence for each locus cluster. The LocusLink database is used by ARROGANT in the design and analysis modes, see Fig 15. A series of Visual Basic [illegible] import files into the database, downloaded from NCBI ([illegible]).

4.1.4 KEGG Genome and Pathway Database: ARROGANT not only combines different databases from NCBI but also uses the KEGG database. The Kyoto Encyclopedia of Genes and Genomes (KEGG) makes available information on pathways consisting of interacting molecules or [illegible]. [illegible] ARROGANT to look for keywords and annotate gene sequences. As a result, KEGG [illegible] tables. A file containing additional pathway information is used [illegible] (<[illegible]>, <[illegible]/map_title.tab>).

4.1.5 HomoloGene: The HomoloGene database provides homologs/orthologs, which are used as a [illegible] annotation of large gene collections by the analysis mode, see Fig 17. It primarily uses the UniGene cluster identifier to search for homologs/orthologs. Accession numbers and LocusLink identifiers may also be used. HomoloGene uses nucleotide sequence comparisons to [illegible]. The HomoloGene database is downloaded from <[illegible]>. [illegible] Perl scripts format the downloaded file, [illegible] the file into the database.

[illegible] clones from the IMAGE consortium. The catalog of clones available at Research Genetics can be downloaded at <ftp[illegible]>. [illegible] The catalog contains annotation related to the clones like accession number, gene name, cluster ID, insert size, [illegible], etc. ARROGANT stores this catalog locally in the database, which is used to find commercially available clones and search for candidate genes in the design mode, see Fig 18.



6.3 ARROGANT used for identifying and annotating genes for polymorphism discovery to link to cardiac diseases for PGA: The Program for Genomic Applications (PGA) is a nationwide attempt to use genomic and proteomic methods to study and investigate cellular responses to injury and inflammation. The program endeavors to identify the genes and proteins involved in these responses. ARROGANT was used to both [illegible] as well as annotate the current PGA list of 253 genes. The ability of ARROGANT to find potential candidates was tested by comparing the list obtained using keyword search with the current list of genes. The list of keywords compiled by researchers participating in PGA was as follows:
hyperlipidemia
low density lipoproteins 
dietary responsiveness 
high density lipoproteins 
coronary calcification 
insulin resistance 
cardiac hypertrophy 
coronary artery disease 
coronary atherosclerosis 



arteriosclerosis 

cholesterol 

inflammation

cytokine 
orphan receptor 
cardiac failure 
signal transduction 
G-protein 



ARROGANT found 3,789 genes associated with the above keywords. There were 13 genes found in common with the current PGA list of 253 genes. This demonstrated the keyword search capability of ARROGANT to look [illegible] compiled list was annotated using the analysis mode and is available on the web at: [illegible]. ARROGANT was also used to annotate the current PGA list of 253 genes.
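The comparison described above amounts to a text search over annotation fields followed by a set intersection with the reference list. A minimal sketch, assuming each gene's annotation has been concatenated into one string and that a case-insensitive substring match is sufficient; the function names are hypothetical, and the gene symbols and annotation text in the check are made up for illustration.

```python
from typing import Dict, Iterable, Set

def keyword_hits(annotations: Dict[str, str], keywords: Iterable[str]) -> Set[str]:
    """Gene IDs whose annotation text contains any keyword
    (case-insensitive substring match)."""
    terms = [k.lower() for k in keywords]
    return {gene for gene, text in annotations.items()
            if any(t in text.lower() for t in terms)}

def overlap(found: Set[str], reference: Iterable[str]) -> Set[str]:
    """Genes present in both the keyword-search result and a reference list."""
    return found & set(reference)
```

With the figures reported above, this intersection step corresponds to the 13 genes shared between the 3,789 keyword hits and the 253-gene PGA list.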

The ability of ARROGANT in the analysis mode to accept a tab-delimited list of genes [illegible] priority and 0 - Low priority. The annotated table is available on the web at:

<http://ARROGANT.swmed.edu/[illegible]>
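Reading such a list can be sketched as below, assuming two tab-separated columns (gene identifier, integer priority) with larger numbers meaning higher priority and a missing priority treated as 0 (low). The column layout and the function name are assumptions, since the full priority scale is not legible here.

```python
import csv
import io
from typing import List, Tuple

def read_prioritized(tsv_text: str) -> List[Tuple[str, int]]:
    """Parse 'gene<TAB>priority' lines and return (gene, priority) pairs
    sorted from highest to lowest priority; ties keep input order."""
    rows = []
    for fields in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not fields or not fields[0].strip():
            continue  # skip blank lines
        gene = fields[0].strip()
        priority = int(fields[1]) if len(fields) > 1 and fields[1].strip() else 0
        rows.append((gene, priority))
    return sorted(rows, key=lambda r: -r[1])
```

Because Python's sort is stable, genes with equal priority stay in the order the researcher listed them.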

6.4 ARROGANT used in the study of Robert's Syndrome: Robert's Syndrome is a genetic
disorder caused by chromosome damage during cell division, and characterized by loss of limb
bones, cleft palate, heart defects and abnormalities of the abdominal organs. ARROGANT was 
used to find new potential candidate genes for Robert's syndrome using keywords: 
Robert syndrome
Roberts syndrome 
Robert's syndrome 
Pseudothalidomide syndrome 
SC phocomelia syndrome 
heterochromatin 
Heterochromatic repulsion 
Heterochromatic splaying 
Premature centromere separation 
premature separation 
Tetraphocomelia 
Limb reduction 
hypoplastic 
Long bone 
Aneuploidy 
Craniofacial 
Oxycephalic 
aplasia of the fibula 
bilateral clubfoot 
absence of radii 
cleft lip and palate 
oligodactyly 
microcephaly 
exophthalmus 
hypertelorism 

corneal clouding
hemangiomas 



hypoplastic nasal and auricular cartilage 

atrial septal defect 

patent ductus arteriosus 

polycystic kidneys 

fused kidneys 

horseshoe kidneys 

micronucleation 

enlargement of the phallus 

absent nails 

ICF syndrome 

Centromeric instability immunodeficiency syndrome
MECP2 

Methyl binding protein 
Hypomethylation 
Hypermethylation 
Demethylation 
demethyltransferase 

Methylation 
methylase 
mSIN3A 
Histone 

Histone acetylation 
Histone acetylase 
Histone deacetylase 
TAR syndrome 



ARROGANT found 6,326 genes, which were further annotated using the analysis mode. The results are available on the web at: [illegible]

[illegible] of Robert's Syndrome was obtained. There was one gene in common between the two lists. This again demonstrates the utility of [illegible] was also annotated using the analysis mode and the results are available on the web [illegible]

6.5 ARROGANT used to annotate genes on commercial DNA chips: ARROGANT was used in the analysis mode to annotate various microarrays available from Affymetrix (Santa Clara, CA), [illegible] helps in making important observations. The following [illegible] (Affymetrix) human and [illegible]

[illegible] annotated list is available on the web at [illegible]

2. Rat RG-U34 microarray: This consists of 1,322 genes from the [illegible] genome. The results are available on the web at <http://ARROGANT.swmed.edu/[illegible]>.

6.6 ARROGANT used to annotate genes on chromosome 3p: ARROGANT was used to identify genes commonly mutated or whose expression is deregulated in human lung and breast cancers. Although several regions of loss occur on multiple chromosomes, it was observed that allele loss in the chromosome 3p21.3 area was the earliest pre-malignant change so far detected in lung cancer development ([illegible]). [illegible] ARROGANT was used to annotate the 32 genes on chromosome 3p thought to be important in causing lung cancer. The results are available at: [illegible], <http://ARROGANT.swmed.edu/[illegible]/hideandsort.asp?txt_array=40357>.

6 7 ARROGANT used to analyze human microarrays: Our laboratory has developed a human 
cDNA microarray, which consists of 10,000 clones from Research Genetics. Many laboratories at UTSW (University of Texas Southwestern Medical Center at Dallas) are using this microarray for various research studies like cancer, aging, etc. ARROGANT provides annotation for all the genes as one table. The researchers can overlay their expression level data on this table, which would help them make important observations. For example, the researcher could look at the pathways for all the highly expressed genes and also know their position in the genome. Further, the researcher could also sort the data using ARROGANT to bring the interesting genes to the top of the table. ARROGANT annotation of the human 10,000 array is available on the web at <http://ARROGANT.swmed.edu/[illegible]>.

ARROGANT also annotated our earlier human array consisting of 4,200 elements and the results are available at <http://ARROGANT.swmed.edu/[illegible]/hideandsort.asp?txt_array=60718>.
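Overlaying expression data on the annotation table and sorting it can be sketched as a join on the clone identifier. This assumes the annotation has been exported as rows of fields keyed by a hypothetical `clone_id` column and that the researcher supplies one expression value per clone; it is an illustration of the idea, not ARROGANT's own implementation.

```python
from typing import Dict, List, Optional

def overlay_and_sort(annotation: List[Dict[str, str]],
                     expression: Dict[str, float],
                     id_field: str = "clone_id") -> List[Dict[str, object]]:
    """Copy each annotation row, attach the expression value for its ID under
    the key 'expression' (None if absent), and sort rows from highest to lowest
    expression, placing rows without a value last."""
    rows: List[Dict[str, object]] = []
    for row in annotation:
        merged: Dict[str, object] = dict(row)
        merged["expression"] = expression.get(row[id_field])
        rows.append(merged)

    def sort_key(r: Dict[str, object]):
        value: Optional[float] = r["expression"]  # type: ignore[assignment]
        # Rows lacking a value sort after all rows that have one.
        return (value is None, -(value if value is not None else 0.0))

    return sorted(rows, key=sort_key)
```

Once sorted, the pathway and chromosomal-position columns of the top rows give the view of highly expressed genes described above.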